Adaptive critic designs: A case study for neurocontrol
Authors
Abstract
For the first time, different adaptive critic designs (ACDs), a conventional proportional integral derivative (PID) regulator and backpropagation of utility are compared on the same control problem: automatic aircraft landing. The original problem proved to contain little challenge, since various conventional and neural network techniques had already solved it very well. After the problem had been made much more difficult by a change of parameters, increasingly better performance was observed by going from the simplest ACD to more sophisticated designs, with dual heuristic programming ranked best of all. This case study is of use in general intelligent control problems, for it provides an example of the capabilities of different adaptive critic designs.

Keywords: Adaptive critic, Heuristic dynamic programming, Aircraft autolanding, Neurocontrol, Neural networks for control and optimization.

Acknowledgements: The authors gratefully acknowledge support from the Texas Tech Center for Applied Automation and Research Grant entitled "Applied Computational Intelligence Laboratory", and the National Science Foundation Neuroengineering Program Grant No. ECS-9413120. DHP and related designs are patents pending 1994 by BehavHeuristics, Inc. and Paul Werbos; all rights are reserved. The authors also thank Paul Werbos and Ken Otwell for their support and assistance. Furthermore, we are pleased to acknowledge the work of Chuck Jorgensen et al. in the original development of the autolander problem statement (Jorgensen and Schley, 1990). Requests for reprints should be sent to Danil Prokhorov, Box 43102, Department of Electrical Engineering, Texas Tech University, Lubbock, TX 79409-3102, USA.

1. ADAPTIVE CRITIC DESIGNS

ACDs are tools for solving difficult optimization problems (Werbos, 1990). They include heuristic dynamic programming (HDP), dual heuristic programming (DHP) and globalized DHP (GDHP), as well as their action-dependent forms, referred to below with the prefix AD. (We will use the four abbreviations HDP, DHP, GDHP and ACD, plus the prefix AD on the first two, for the remainder of this paper.) All of these designs attempt to approximate dynamic programming, which gives ACDs their power to work effectively in a noisy, nonlinear environment while making minimal assumptions about the nature of that environment.

A typical ACD consists of three neural nets: critic, action and model. The critic net outputs a function J which is an approximation of the secondary utility function J* of dynamic programming (as in HDP, ADHDP and GDHP) or of the derivatives of J* with respect to the state variables R (as in DHP and ADDHP). This function has to be approximated because finding it exactly is computationally intractable. The goal is to maximize or minimize J in the immediate future (in the next time step), which produces an optimum U, the primary utility function, in the long run. This goal is accomplished by the action network, which outputs a control vector A that optimizes J. Adaptation of the action network is based on the derivatives of J with respect to the components of the vector A. A straightforward way to obtain those derivatives is to use backpropagation through the other nets of the design. The use of the backpropagation algorithm to find the derivatives of J is the most crucial distinction between HDP and the well-known adaptive critic element (Barto et al., 1983).
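To make the adaptation path just described concrete, here is a minimal sketch of an action-dependent (ADHDP-style) update, in which the critic receives both R and A so that the derivatives of J with respect to A come directly from backpropagation through the critic. The use of PyTorch, the network sizes, and all names are illustrative assumptions of this sketch, not the authors' implementation.

```python
# Minimal ADHDP-style action adaptation sketch (illustrative, not the paper's code).
# The critic maps (state R, action A) to a scalar J approximating the secondary
# utility J*; the action net maps R to the control vector A and is adapted with
# dJ/dA obtained by backpropagating J through the critic.
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2                                   # illustrative sizes

critic = nn.Sequential(nn.Linear(state_dim + action_dim, 16), nn.Tanh(),
                       nn.Linear(16, 1))                       # J(R, A)
action = nn.Sequential(nn.Linear(state_dim, 16), nn.Tanh(),
                       nn.Linear(16, action_dim))              # A(R)

action_opt = torch.optim.SGD(action.parameters(), lr=1e-3)

def adapt_action(R):
    """One adaptation step of the action network; the critic's own training is separate."""
    A = action(R)                                   # control vector A
    J = critic(torch.cat([R, A], dim=-1)).mean()    # critic's estimate of J*
    action_opt.zero_grad()
    J.backward()       # backpropagation yields dJ/dA and hence gradients for the action weights
    action_opt.step()  # here J is treated as a cost and minimized

adapt_action(torch.randn(8, state_dim))             # batch of 8 example states
```

The sign convention depends on how the primary utility U is defined: the sketch treats J as a cost to be minimized, whereas a reward-like U would call for gradient ascent instead.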
If the environment or an object to be controlled does not allow backpropagation through itself, the model network is used. It mediates propagation of the derivatives of J from the critic network to the action network in HDP and, additionally, provides proper targets for the critic network's adaptation in DHP. The model network forecasts the next states R(t+1), R(t+2), ..., R(t+N) of the environment. These forecasts, rather than actual states, are fed into the critic, provided that the model is sufficiently accurate. Whenever making one-step-ahead predictions…
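Complementing the excerpt above, the following sketch (again an illustrative PyTorch assumption, not the authors' code) shows the model-mediated derivative path: the critic is fed the forecast R(t+1) produced by the model net, and backpropagating J carries the derivatives from the critic through the model to the control vector A and on to the action network's weights.

```python
# Sketch of derivative propagation through a model network (illustrative only).
# The plant is not backpropagated through; instead a learned one-step model
# predicts R(t+1) from (R, A), and the critic evaluates that forecast.
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2                                   # illustrative sizes

model  = nn.Sequential(nn.Linear(state_dim + action_dim, 16), nn.Tanh(),
                       nn.Linear(16, state_dim))               # predicts R(t+1)
critic = nn.Sequential(nn.Linear(state_dim, 16), nn.Tanh(),
                       nn.Linear(16, 1))                       # J(R(t+1))
action = nn.Sequential(nn.Linear(state_dim, 16), nn.Tanh(),
                       nn.Linear(16, action_dim))              # A(R)

action_opt = torch.optim.SGD(action.parameters(), lr=1e-3)

R = torch.randn(8, state_dim)                  # batch of current states
A = action(R)                                  # control vector
R_next = model(torch.cat([R, A], dim=-1))      # one-step-ahead forecast
J = critic(R_next).mean()                      # critic fed with the forecast

action_opt.zero_grad()
J.backward()          # dJ/dA flows: critic -> model -> action network
action_opt.step()     # only the action net is updated in this step
```

Training of the critic and model nets themselves (for example, providing DHP's derivative targets) is omitted; the sketch only illustrates the derivative path described in the text.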
Similar articles
Speeding-Up Adaptive Heuristic Critic
Neurocontrol is a crucial area of fundamental research within the neural network field. Adaptive Heuristic Critic learning is a key algorithm for real-time adaptation in neurocontrollers. In this paper we present how an unsupervised neural network model with adaptable structure can be used to speed up Adaptive Heuristic Critic learning, its FPGA design, and how it adapts the neurocontroller to t...
Adaptive Critic Designs - Neural Networks, IEEE Transactions on
We discuss a variety of adaptive critic designs (ACD’s) for neurocontrol. These are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Our discussion of these origins leads to an explanation of three design families: Heuristic dynamic programming (HDP), dual heu...
Partial, noisy and qualitative models for adaptive critic based neurocontrol
The roles of plant models in adaptive critic methods for approximate dynamic programming are considered, with primary focus given to the DHP methodology. In place of complete system identification, partial, approximate, and qualitative models of plant dynamics are considered. Such models are found to be sufficient for successful controller design. As classification is in general easier than reg...
Adaptive Critic Based Approximate Dynamic Programming for Tuning Fuzzy Controllers
Abstract: In this paper we show the applicability of the Dual Heuristic Programming (DHP) method of Approximate Dynamic Programming to parameter tuning of a fuzzy control system. DHP and related techniques have been developed in the neurocontrol context but can be equally productive when used with fuzzy controll... (This work was supported by the National Science Foundation under grant ECS-9904378.)
A comparison of training algorithms for DHP adaptive critic neurocontrol
A variety of alternate training strategies for implementing the Dual Heuristic Programming (DHP) method of approximate dynamic programming in the neuro-control context are explored. The DHP method of controller training has been successfully demonstrated by a number of authors on a variety of control problems in recent years, but no unified view of the implementation details of the method has y...
Journal: Neural Networks
Volume: 8, Issue: -
Pages: -
Publication year: 1995